I have a copy of a Drupal 9 database from which I need to extract all the pages.
I loaded the database in MySQL Workbench CE and connected to it via Python. There are many tables, but no views or stored procedures. I guess that some of those tables hold the content, but I have no idea how to combine them to reconstruct the web pages.
block_content_body looks promising, but then what?
I assume it is a standard Drupal database. Is there a standard schema?
This is what I have tried.
import os
import mysql.connector

# connect to drupal db - done...
# copy to directory assigned.
# othercode config....
try:
    # Define the tables and columns to extract HTML content from
    tables_columns = [
        ('node_field_data', 'body_value'),
        ('block_content_field_data', 'body_value'),
        ('field_data_body', 'body_value'),
        ('field_data_[custom_field_name]', '[custom_field_column]'),
        ('paragraph__field_[custom_field_name]', '[custom_field_column]'),
        ('views_view', 'display_options'),
        # Add additional tables and columns as needed
    ]
    # Iterate over the tables and columns and export HTML content to separate files
    for table, column in tables_columns:
        output_file = os.path.join(output_dir, f'{table}_{column}.html')
        query = f"SELECT {column} FROM {table}"
        cursor.execute(query)
        rows = cursor.fetchall()
        with open(output_file, 'w', encoding='utf-8') as file:
            for row in rows:
                html_content = row[0]
                # Skip NULL values, which cannot be concatenated with a string
                if html_content is not None:
                    file.write(html_content + '\n')
        print(f"HTML content from {table}.{column} exported to: {output_file}")
except mysql.connector.Error as error:
    print(f"Error retrieving data from MySQL: {error}")
finally:
    # Close the database connection
    if connection.is_connected():
        cursor.close()
        connection.close()
I still have to correct the table and field names, but I guess the pages are somehow assembled by joining the HTML fragments stored in tables like these?
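To make the kind of join I am guessing at concrete: as far as I understand, Drupal 8/9 core keeps node titles in node_field_data and the body text in node__body, linked by entity_id = nid. Below is a minimal sketch of that join, assuming those table and column names exist in my copy (I have not confirmed this). It runs against an in-memory SQLite stand-in with made-up rows so it works without my MySQL connection; fetch_pages and demo_connection are hypothetical helper names.

```python
import sqlite3

# Assumed Drupal 8/9 core layout (not verified against my database):
# node_field_data has one row per node and language, node__body holds
# the body field and points back to the node through entity_id.
JOIN_SQL = """
    SELECT n.nid, n.title, b.body_value
    FROM node_field_data AS n
    LEFT JOIN node__body AS b
           ON b.entity_id = n.nid
          AND b.langcode = n.langcode
          AND b.deleted = 0
    WHERE n.status = 1
    ORDER BY n.nid
"""

def fetch_pages(conn):
    """Return (nid, title, body_html) for each published node."""
    cur = conn.cursor()
    cur.execute(JOIN_SQL)
    # Nodes without a body row come back as NULL; normalise to ''.
    return [(nid, title, body or "") for nid, title, body in cur.fetchall()]

def demo_connection():
    """Build a toy in-memory database with made-up rows (not real data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE node_field_data (
            nid INTEGER, langcode TEXT, status INTEGER, title TEXT);
        CREATE TABLE node__body (
            entity_id INTEGER, langcode TEXT, deleted INTEGER, body_value TEXT);
        INSERT INTO node_field_data VALUES
            (1, 'en', 1, 'Home'),
            (2, 'en', 0, 'Draft'),
            (3, 'en', 1, 'Empty');
        INSERT INTO node__body VALUES
            (1, 'en', 0, '<p>Welcome</p>'),
            (2, 'en', 0, '<p>Hidden</p>');
    """)
    return conn

pages = fetch_pages(demo_connection())
for nid, title, body in pages:
    print(nid, title, body)
```

If this is roughly right, the same JOIN_SQL string could be passed to the existing mysql.connector cursor, and each row written to its own file (for example one HTML file per nid) instead of one file per table.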
Do you have suggestions, or is this just crazy talk?